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The clustering of Lya emitters in a ACDM Universe 
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ABSTRACT 

We combine a semi-analytical model of galaxy formation with a very large simulation which 
follows the growth of large scale structure in a ACDM universe to predict the clustering of 
Lya emitters. We find that the clustering strength of Lya emitters has only a weak dependence 
on Lya luminosity but a strong dependence on redshift. With increasing redshift, Lya emitters 
trace progressively rarer, higher density regions of the universe. Due to the large volume 
of the simulation, over 100 times bigger than any previously used for this application, we 
can construct mock catalogues of Lya emitters and study the sample variance of current 
and forthcoming surveys. We find that the number and clustering of Lya emitters in our 
mock catalogues are in agreement with measurements from current surveys, but that there is a 
considerable scatter in these quantities. We argue that a proposed survey of emitters at z = 8.8 
should be extended significantly in solid angle to allow a robust measurement of Lya emitter 
clustering. 
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1 INTRODUCTION 

The study of galaxies at high redshifts opens an important window 
on the process of galaxy formation and conditions in the early uni- 
verse. The detection of populations of galaxies at high redshifts is 
one of the great challenges in observational cosmology. Currently 
three main observational techniques are used to discover high red- 
shift, star-forming galaxies: (i) The Lyman-break drop-out tech- 
nique, in which a galaxy is imaged in a combination of three or 
more optical or near-IR bands. The longer wavelength filters de- 
tect emission in the rest-frame ultraviolet from ongoing star for- 
mation, whereas the shorter wavelength filters sample the Lyman- 
break feature. Hence, a Ly man-break galaxy appear s blue in one 
colour and red in the other JSteidel et aTlll996l , Tl999h . By shifting 
the whole filter set to longer wavelengths, the Lyman-break fea- 
ture can be isolated at higher redshifts; (ii) Sub-milli metre emis- 
sion, due to dust being hea ted when it absorbs starlight iSmail et al.l 
ll997l:lHughes et aljl998h . The bulk of the energy absorbed by the 
dust comes from the rest-frame ultra-violet and so the dust emis- 
sion is sensitive to the instantaneous star formation rate; (iii) Ly- 
q line emission from star formin g galaxies, ty pically identified 
using either narrowband imaging IHu. Cow ie & McMahon 1998; 
IKudritzki et"afl |200C| ; IGawiser et al.l |2007| ; lOuchi et all 120071) or 
long-slit spectroscopy of gravitationally len sed regions JEllis et al.l 
1200 it ISantos et alj|2004 Istark et alj|2007ri . The Ly-a emission is 
driven by the production of Lyman-continuum photons and so is 
dependent on the current star formation rate. 
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The Lyman-break drop-out and sub-millimetre detection 
methods are more established than Ly-a emission as a means of 
identifying substantial populations of high redshift galaxies. Nev- 
ertheless, in the last few years there have been a number of Lya 
surveys which have successfully found high redshift galaxies e.g. 
JHu, Cowie & McMahorjl 19981 ; IKudritzki et al.ll200d) . The obser- 
vational samples have grown in size such that statistical studies 
of the properties of Lya e mitters have now become possible: for 
example, the SXDS Survey l lOuchi et alj|20 05. 2007) has allowed 
estimates of the luminosity function (LFs) and clustering of Lya 
emitte rs in the redshift range 3 < z < 6, an d the MUSYC sur- 
vey iGronwall et alj|2007l : IGawiser et alj|2007r) has also produced 
clustering measurements at z ~ 3. Furthermore, the highest red- 
shift galaxy (z — 6.96 ) robustly dete cted to date was found us- 
ing the Lya technique jive et alJuOOq) . Taking advan tage of the 
magnification of faint sources by gravitational lensing, Istark et al.l 
(2007) reported 6 candidates for Lya emitters in the redshift range 
8.7 < z < 10.2, but these ha ve yet to be confirmed. The DA- 
zLE Project Morton et alj|2004f ) is designed to find Lya emitters 
at z — 7.73 and z — 8.78. However, the small field of view of 
the instrument (6.83' x 6.83') makes it difficult to use to study 
large scale structu re (LSS) at su ch redshifts. On the other hand, 
the ELVIS Survey JNilsson et al.ll2007 al ibi) would appear to offer a 
more promising route to study the LSS of very high redshift galax- 
ies (z = 8.8). 

Despite these observational breakthroughs, predictions of the 
properties of star-forming Lya emitting galaxies are still in the 
relatively early stages of development. Often these calculations 
employ crude assumptions about the galaxy formation process 
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to derive a star formation rate and hence a Lya luminosity, or 
use hydrodynamical simulations, which, due to the high compu- 
tational overhead, study relatively small cosmological volumes. 
lHaiman & Spaansl d 19990 made predictions for the escape frac- 
tion of Lya emission and the abundance of Lya emitters using 
the Press-Schechter formalism and a prescription for the dust dis- 
tribution in galaxies. Radiative transfer calculations of the escape 

i n j n I — L — I 

fraction have been made by Zheng & Miralda-Escude ( 2002), Ahn 

(2004) and Verha mme et al . (2006) for ideal ized geometri es, while 
iTasitsiomil (2006) and lLaursen & Sommer-Larsenl J2007D applied 
these calculations to galaxies taken from cosmological hydrody- 
namical simulations. Barton et al (2004) and. Furlanetto et all (2005) 
calculated the number density of Ly-a emitters using hydrody- 
namical simulations of galaxy formation. Nagamine et al. (2006, 
2008) used hydrodynamical simulations to predict the abundance 
and clustering of Lya emitters. The typical computational boxes 
used in these calculations are very small (~ 10 — 30/t -1 Mpc), 
which makes it impossible to evolve the simulation accurately to 
z — 0. Hence, it is difficult to test if the galaxy formation model 
adopted produces a reasonable description of present day galaxies. 
Furthermore, the small box size means that reliable clustering pre- 
dictions can only be obtained on scales smaller than the typical cor- 
relation length of the galaxy sample. As we will show in this paper, 
small volumes are subject to significant fluctuations in clustering 
amplitude. 

The semi-analytical approach to modelling galaxy formation 
allows us to make substantial improvements over previous calcula- 
tions of the properties of Lya emitters. The speed of this technique 
means that large populations of galaxies can be followed. The range 
of predictions which can be made using semi-analytical models is, 
in general, broader than that produced from most hydrodynamical 
simulations, so that the model predictions can be compared more 
directly with observational results. A key advantage is that the mod- 
els can be readily evolved to the present-day, giving us more faith 
in the ingredients used; i.e. we can be reassured that the physics un- 
derpinning the predictions presented for a high-redshift population 
of galaxies would not result in too many bright/massive galaxies at 
the present day. 

The first semi-analytical calculation of the properties of Lya 
emitters based on a hierarch i cal m odel of galaxy formation was car- 
ried out bv lLe Delliou et al.l J2005I) . This is the model used through- 
out this work, which has been shown to be successful in predicting 
the properties of Lya emitters over a wide range of redshifts. The 
semi-analytical m odel allows us to connec t Lya emission to other 
galaxy properties. iLe Delliou et al.l ([2006) showed that this model 
succesfully predicts the observed Lya LFs and equivalent widths 
(EWs), along with some fundamental physical properties, such as 
star formati on rates (SFRs), gas metallicities, and stellar and halo 
masses. In lNilsson et alj 1 12007 al) . we used the model to make fur- 
ther predictions for the LF of very high redshift Lya emitters and 
to study the feasibility of current and fort hcoming surveys which 
aim to detect such high redshift galaxies. iKobavashi et alj (2007) 
developed an independent semi-analytical model to derive the lu- 
minosity functions of Lya emitters. 

The focus of this paper is to use the model introduced by 
ILe Delliou et alj (2005) to study the clustering of high-redshift Lya 
emitting galaxies and to extend the comparison of model predic- 
tions with current observational data. Le Delliou et al. ( 2006) al- 
ready gave an indirect prediction of the clustering of Lya emitters 
by studying galaxy bias as a function of Lya luminosity. How- 
ever, these results depend on an analytical model for the halo bias 
dSheth. Mo & Tor men 2001), and furthermore the linear bias as- 



sumption breaks down on small scales. Here we will present an 
explicit calculation of the clustering of galaxies by implementing 
the semi-analytical model on top of a large N-body simulation of 
the hierarchical clustering of the dark matter distribution. This al- 
lows us to predict the spatial distribution of Lya emitting galaxies, 
and to create realistic maps of Lya emitters at different redshifts. 
These maps can be analysed with simple statistical tools to quantify 
the spatial distribution and clustering of galaxies at high redshifts. 
The N-body simulation used in this work is the Millennium Simu- 
lation, carried out by the Virgo Consortium JSpringel et al. 2005). 
The simulation of the spatial distribution of Lya emitters is tested 
by creating mock catalogues for different surveys of Lya emitting 
galaxies in the range 3 < z < 9. The clustering of Lya emit- 
ters in our model is analysed with correlation functions and halo 
occupation distributions. Taking advantage of the large volume of 
the Millennium simulation, we also compute the errors expected 
on correlation function measurements from various surveys due to 
cosmic variance. 

The outline of this paper is as follows: Section 2 gives a brief 
description of the semi-analytical galaxy formation model and de- 
scribes how it is combined with the N-body simulation. In Section 
3 we establish the range of validity of our simulated galaxy samples 
by studying the completeness fractions in the model Lya luminos- 
ity functions. Section 4 gives our predictions for the clustering of 
Lya emitters in the range < z < 9. In Section 5 we compare our 
simulation with recent observational data and we also make pre- 
dictions for future measurements (clustering and number counts) 
expected from the ELVIS Survey. Finally, Section 6 gives our con- 
clusions. 



2 THE MODEL 

We use the semi-analytical model of galaxy formation, GALFORM, 
to predict the properties of the Lya emission of galaxies and their 
abundance as a function of r edshift. The GALFORM model is fully 
described in Cole et al (2000) (see also the review by Baugh 2006) 
and th e variant used he re was introduced bv lBaugh et al (2005) (see 
also L acev et al.l l2008. for a more detailed description). The model 
computes star formation histories for the whole galaxy population, 
following the hierarchical evolution of the host dark matter haloes. 

The merger histories of dark matter haloes are calculated us- 
ing a Monte Carlo method, follow ing the formalism of the extended 
Press & Schechter theory dPress & Schechtej[l974l ; lLacev & Cold 
Il993l) . When using Monte Carlo merger trees, the mass resolution 
of dark matter haloes can be arbitrarily high, since the whole of the 
computer memory can be devoted to one tree rather than a popu- 
lation of trees. In contrast, N-body merger trees are constrained by 
a finite mass resolution due to the particle mass, which is usually 
poorer than that typically adopted for Monte Carlo merger trees. 
Discrepancies between the model predictions obtained with Monte 
Carlo trees and those extracted from a simulation only become ev- 
ident fainter than some luminosity which is set by the mass resolu- 
tion of the N-bod y trees, as we will see in the next section (see also 
iHellv et alj2003l) . 

A critical assumption of the Baugh et al. model is that stars 
formed in starbursts have a top-heavy initial mass function (IMF), 
where the IMF is given by dN/dln(m) oc m~ x and x = 0. Stars 
formed quiescently in discs have a sola r neighbourhood IMF, with 
the form proposed by iKennicutl dl983T) : x = 0.4 for m < IMq 
and x — 1.5 for m > IMq. Both IMFs cover the mass range 
0.15M© < m < 125M©. Within the framework of ACDM, 
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Figure 1. The spatial distribution of Lya emitting galaxies (coloured circles) in a slice from the Millennium simulation, with the dark matter distribution 
in green. The four panels are for redshifts in the range < z < 8.5, as indicated in each panel. The colour of the circles changes with the Lya luminosity 
of the galaxies, as shown in the key in the upper-right corner of the first panel. Only galaxies brighter than log(LLya [erg s _1 h~ 2 ]) = 42.2 are plotted. 
Each image covers a square region 100 X 100h _1 Mpc across and having a depth of 10/i _1 Mpc, which is less than one thousandth the volume of the full 
simulation box. 



Baugh et al. argued that the top-heavy IMF is essential to match 
the counts and redshift distribution of galaxies detected through 
their sub-millimetre emission, whilst retaining the match to galaxy 
properties in the local Universe, such as the optical and far-IR 
luminosity funct i ons an d galaxy gas fractions and metallicities. 
iNagashima et alj d2005al lbf) showed that such a top-heavy IMF also 
results in predictions for the metal abundances in the intra-cluster 
medium and in elliptical galaxi es in much better agreement with 
observations. lLacey et alj 120081) showed that the same model pre- 
dicts galaxy evolution in the IR in good agreement with observa- 



tions from Spitzer, and also discussed independent observational 
evidence for a top-heavy IMF. 

Reionization is assumed in our mod el to occur at z rc ion = 10 
JKogut et al.ll2003l ; iDunklev et alj|2008l) . The photoionization of 
the intergalactic medium (IGM) is assumed to suppress the collapse 
and cooling of gas in hal oes with circular vel ocities V c < 60km/s 
at redshifts z < z,.<.inn l Benson et al. 20021). Recent calculations 



dHoeft et al.|[2006l ; lOkamoto. Gap & Theunsll2008h imply that the 
above parameter values overstate the impact of photoionization on 
gas cooling, and suggest that photoionization only affects smaller 
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haloes with V c < 30km/s. Our model predictions, for the range 
of Lya luminosities we consider in this paper, are not significantly 
affected by adopting the lower V c cut. We neglect any attenuation 
of the Lya flux by propagation through the IGM. This effect would 
suppress the observed Lya flux mainly for z > z r cion, so this 
should only affect our results for very high redshifts (z > 10). 

The model used to predict the luminosities and equivalent 
widths of the Lya galaxie s is identical to that described in 
iLe Delliou et al.l (2005, 2006). The Lya emission is computed by 
the following procedure: (i) The integrated stellar spectrum of the 
galaxy is calculated, based on its star formation history, including 
the effects of the distribution of stellar metallicities, and taking into 
account the IMFs adopted for different modes of star formation, 
(ii) The rate of production of Lyman continuum (Lye) photons is 
computed by integrating over the stellar spectrum, and assuming 
that all of these ionizing photons are absorbed by neutral hydro- 
gen within the galaxy. We calculate the fraction of Lya photons 
produced by thes e Lye photons, assuming Case B recombination 
dOsterbrocklll989h . 

(iii) The observed Lya flux depends on the fraction of Lya 
photons which escape from the galaxy (/ CS c), which is assumed to 
be constant and independent of galaxy properties. 

Calculating the Lya escape fraction from first principles by 
following the radiative transfer of the Lya photons is very demand- 
ing computationally. A more complete calculation of the escape 
fraction would have into account the structure and kine matic prop- 
erties of the intestellar medium (ISM) dZheng & Miralda-Escuda 
120021 ; lAhnll2004l ; IVerhamme et al.ll2006h . In our model, we adopt 
the simplest possible approach, which is to fix the escape frac- 
tion, / csc , to be the same for each galaxy, without taking into ac- 
count its dust properties. This results in a surprisingly good agree- 
ment between the predicted number counts and luminosity func- 
tio ns of emitters and the availa b le observations a t 3 < z < 
7 JLe Delliou etal] 12001 l2006h . ILe Delliou et"all feOOSI) chose 
/csc = 0.02 to match the number counts at z ~ 3 at a flux 
/ ~ 2 x 10 _17 ergcm _2 s _1 . The same value is used in this 
work. This value for the Lya escape fraction seems very small, 
but is co nsistent with direct observational estimates for low redshift 
galaxies: Atek et al. (2008) derive escape fractions for a sample of 
nearby star-forming galaxies by combining measurements of Lya, 
Ha and H/3, and find th at most have escape fractions of 3% or less. 
ILe Delliou et alj (2006) also showed that if a standard solar neigh- 
bourhhod IMF is adopted for all modes of star formation, then a 
substantially larger escape fraction would be required to match the 
observed counts of Lya emitters, and even then the overall match 
would not be as quite good as it is when the top-heavy IMF is used 
in bursts. 

Once we obtain the galaxy properties from the semi-analytical 
model, we plant these galaxies into a N-body simulation, in order to 
add information about their positions an d velocities. The simu lation 
used here is the Millennium Simulation JSpringel et alj .2005). This 
simulation adopts concordance values for the parameters of a flat 
ACDM model, H m = 0.25 and fi b = 0.045 for the densities of 
matter and baryons at 2 = 0, h = 0.73 for the present-day value 
of the dimensionless Hubble constant, as = 0.9 for the rms linear 
mass fluctuations in a sphere of radius 8h _1 Mpc at z — and 
n = 1 for the slope of the primordial fluctuation spectrum. The 
simulation follows 2160 3 dark matter particles from z — 127 to 
z — within a cubic region of comoving length 500h _1 Mpc. The 
individual particle mass is 8.6 x 10 h - M©, so the smallest dark 
halo which can be resolved has a mass of 2 x lO lo h _1 M0. 

Dark matter haloes are identified using a Friends-Of -Friends 



(FOF) algorithm. To populate the simulation with galaxies from 
the semi-analytical model, we use the same approach as in 
iBenson et a l. ( 2000). First, the position and velocity of the centre of 
mass of each halo is recorded, along with the positions and veloc- 
ities of a set of randomly selected dark matter particles from each 
halo. Second, the list of halo masses is fed into the semi-analytical 
model in order to produce a population of galaxies associated with 
each halo. Each galaxy is assigned a position and velocity within 
the halo. Since the semi-analytical model distinguishes between 
central and satellite galaxies, the central galaxy is placed at the 
centre of mass of the halo, and any satellite galaxy is placed on 
one of the randomly selected halo particles. Once galaxies have 
been generated, and positions and velocities have been assigned, it 
is a simple process to produce catalogues of galaxies with spatial 
information and any desired selection criteria. 

The combination of the semi-analytical model with the N- 
body simulation is essential to study the detailed clustering of a de- 
sired galaxy population, although the clustering amplit ude on larg e 
scales can also be estimated analytically ( Le Delliou e t al. 2006). 
An example of the output of the simulation is shown in the four 
images of Fig. Q] which show redshifts z = 0, z — 3.3, z = 5.7 
and z — 8.5. The dark matter distribution (shown in green) be- 
comes smoother as we go to higher redshifts, due to the gravita- 
tional growth of structures. As shown in Fig.[T] for this particular 
luminosity cut, the number density of Lya emitters varies at differ- 
ent redshifts. As we will show in the next section, these catalogues 
at high redshift are not complete at faint luminosities, so we have to 
restrict our predictions to brighter luminosities as we go to higher 
redshifts. 



3 LUMINOSITY FUNCTIONS 



The model presented by ILe Delliou et alj a2005l . I2006T) differs in 
two main ways from the one presented in this paper: (i) there is a 
slight difference in the values of the cosmological parameters used, 
and (ii) the earlier work used a grid of halo masses together with 
an analytical halo mass function, rather than the set of haloes from 
an N-body simulation. In §3.1, we investigate the impact of the dif- 
ferent choice of cosmological parameters on the luminosity func- 
tion of Lya emitters, to se e if the very good agree ment with ob- 
servational data obtained bv lLe Delliou et al . (2006) is retained on 
adopting the Millennium cosmology. In §3.2, we assess the com- 
pleteness of our samples of Lya emitters due to the finite mass 
resolution of the Millennium simulation. 



3.1 Comparison of model predictions with observed 
luminosity functions 

In this section, we investigate the impact on the model predictions 
of the choice of cosmological p arameters by re-running the model 
of lLe Delliou et alj J2005I .I2006). keeping the galaxy formation pa- 
rameters the same but changing the cosmological parameters to 
mat ch those used in the Mil lennium simulation. To recap, the orig- 
inal |LeDeiIiouetai] {2003) model used ft m = 0.3, fi A = 0.7, 
fib = 0.04 , as = 0.93 and h = 0.7. In Fig. [2] we com- 
pare the cumulative luminosity functions obtained with GALFORM 
for the two sets of cosmological parameters with current observa- 
tional data in the r edshift range 3 < z < 7. The observational 
data is taken from: Kudritzki et al. (2000) (crosses), Cowie & Hu 



1998 ) (asterisks). JGawiser et all fc007l) (diamonds). lOuchi et all 
(2007) (triangles and squares) in the z = 3.3 panel: lAiiki et alj 
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Figure 2. The cumulative luminosity functions of Lyo emitters at redshifts 
z = 3.3 (Top), z = 5.7 (Center) and z = 6.7 (Bottom). The blue points 
correspond to observational data (as indicated by the key with full refer- 
ences in the text). The black and red curves correspond, respectively, to 
the GALFORM predictions using the cosmological parameters of the Mil- 
lennium Simulation and those adopted in Le Delliou et al. 



J2003h (p luses).lMaier et aljj2003b (asteri sks).lHu et al.N2004l) (di- 
amonds), iRhoads et al.l (120031) (tri angles'). IShimasaku et al.l J2006I) 
(squares) and Ouchi et al. (2007) (crosse s) in the z = 5.7 panel ; 
andlTaniguchi et al. (2005) (crosses) and iKashikawa et al.l (2006) 
(asterisks and diamonds) in the z = 6.7 panel. At z — 3.3, the 
two model curves agree very well, and are consistent with the ob- 
servational data shown. At z = 5.72, the two models do not match 
as well as in the previous case, but both are still consistent with the 
observational data. Finally, at 2 = 6.7 the differences are small and 
both curves are consistent with observational data. The conclusion 
from Fig. [2] is that there is not a significant change in the model 
predictions on using these slightly different values of the cosmo- 
logical parameters. Furthermore, the observational data is not yet 
sufficiently accurate to distinguish between the two models or to 
motivate the introduction of further modifications to improve the 
level of agreement, such as using a different Ly« escape fraction. 

3.2 The completeness of the Millennium galaxy catalogues 

The Millennium simulation has a halo mass resolution limit of 
1.72 x 10 10 h~ 1 M e . In a standard GALFORM run, a grid of haloes 
which extends to lower mass haloes than the Millennium resolution 
is typically used, with M lm = 5 x 10 9 h~ 1 M Q at z = 0. A fixed 
dynamic range in halo mass is adopted in these runs, but with the 
mass resolution shifting to smaller masses with increasing redshift: 
for our standard setup, we have M Tee = 7.8 x 10 7 /i _1 Mq and 
1.4 x 10 7 hT 1 Mq at z = 3 and 6 respectively. Therefore, when 
putting GALFORM galaxies into the Millennium, our sample does 
not contain galaxies which formed in haloes with masses below the 
resolution limit of the Millennium. This introduces an incomplete- 
ness into our catalogues when compared to the original GALFORM 
prediction. The incompleteness of the galaxy catalogues is more 
severe for low luminosity galaxies because they tend to be hosted 
by low mass haloes, as will be shown in the next section. Here- 
after, we will use N-body sample to refer to the GALFORM galaxies 
planted in the Millennium haloes, to distinguish them from the pure 
GALFORM catalogues generated using a grid of halos masses. 

In order to quantify the incompleteness of the N-body sample 
as a function of luminosity, we define the completeness fraction as 
the ratio of the cumulative luminosity function for the N-body sam- 
ple to that obtained for a pure GALFORM calculation, and look for 
the luminosity at which the completeness fraction deviates from 
unity. The panels of Fig. [3] give different views of the complete- 
ness of the N-body samples. The top panel shows the luminosity 
above which a catalogue can be considered as complete: we define 
the completeness limit as the luminosity at which the complete- 
ness fraction first drops to 0.85. The figure clearly shows how the 
luminosity corresponding to this completeness limit becomes pro- 
gressively brighter as we move to higher redshifts. For z > 9 the 
N-body sample is incomplete at all luminosities plotted. 

The bottom panel of Fig. [3] shows how the sample becomes 
more incomplete at any redshift as we consider fainter fluxes. A 
sample with galaxies brighter than log(FL ya [erg s -1 cm -2 ]) = 

— 19) is less than 70% complete at all redshifts z > 5, while a 
sample with galaxies brighter than log(i<Lya[erg s" 1 cm -2 ]) = 

— 17 is always over 90% complete for z < 9. The completeness 
fraction monotonically decreases with increasing redshift until z ~ 
6 for very faint fluxes. For z > 6 the completeness rises again: the 
shape of the bright end of the luminosity function at this redshift is 
sensitive to the choice of the redshift of reionization. 

In summary, the requirement that our samples be at least 80% 
complete restricts the range of validity of the predictions from the 
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Figure 3. Completeness of the Millennium galaxy catalogues with re- 
spect to Lya luminosity or flux. (Top): The minimum luminosity down 
to which the catalogues are 85% complete. (Bottom): The complete- 
ness fraction as a function of redshift for a range of fluxes —19 < 
log(Fi,y a [erg s _1 cm -2 ]) < —17, as indicated by the key. 



Millennium simulation to redshifts below 9, and fluxes brighter 
than log(i<Lya [erg s _1 cm -2 ]) > -17.5. 



4 CLUSTERING PREDICTIONS 

In this section we present clustering predictions using Lya emitters 
in the full Millennium volume. To study the clustering of galaxies 
we calculate the two-point correlation function, <-(r), of the galaxy 
distribution. In order to quantify the evolution of the clustering of 
galaxies, we measure the correlation function over the redshift in- 
terval < z < 9. 

To calculate £(r) in the simulaation, we use the standard esti- 
mator (e.g. Peebles 1980): 



l+«r) 



(DD) 



\N^n£sV[r) 



(1) 



where (DD) stands for the number of distinct data pairs with sep- 
arations in the range r to r + Ar, n is the mean number density 
of galaxies, JV ga i is the total number of galaxies in the simulation 
volume and AV(r) is the volume of a spherical shell of radius r 
and thickness Ar. This estimator is applicable in the case of peri- 
odic boundary conditions. In the correlation function analysis, we 
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Figure 4. The correlation function predicted for Lya emitters (black solid 
curve) for a range of redshifts, as indicated in each panel. Lya emitters are 
included down to the completeness limit at each redshift shown in Fig[l] 
The solid red curve shows the correlation function of the dark matter at 
the same epochs. The blue dashed line shows the power law fit of Eq. (2J> 
evaluated in the range 1 < r [Mpc/h] < 10, as delineated by the vertical 
dashed lines. 



consider two parameters which help us to understand the cluster- 
ing behaviour of Lya galaxies: the correlation length, ro, and the 
galaxy bias, b, both of which are discussed below. 



4.1 Correlation Length evolution 

A common way to characterize the clustering of galaxies is to fit a 
power-law to the correlation function: 



«r) 



(2) 



where ro is the correlation length and 7 = 1.8 gives a good fit to the 
slope of the observed correlation functio n over a restricted range of 
pair separations around ro at 2 = (e.g. Davis & Peebles! ( 119830 ). 
The correlation length can also be defined as the scale where f = 1, 
and quantifies the amplitude of the correlation function when the 
slope 7 is fixed. 

Fig. [4] shows the correlation function of Lya emitting galax- 
ies, £ ga i (solid black curves) of the full catalogues down to the com- 
pleteness limits at each redshift, calculated using Eq. (0. The red 
curve shows ^ m , the correlation function of the dark matter. At 
z — 0, £dm is larger than £ ga j, but for z > tjdm is increasingly 
below £ ga i. We will study in detail the comparison of the dark mat- 
ter and Lya galaxy correlation functions in §4.2. 
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Another notable feature of Fig. [4] is that £ ga i(r) differs con- 
siderably from a power law, particularly on scales greater than 10 
h~ Mpc. When fitting Eq. (O to the correlation functions plot- 
ted in Fig. [4] we use only the measurements in the range [1,10] 
h~ Mpc, where £ ga i(r) behaves most like a power law. We fix the 
slope 7 = 1.8 for all £ ga i(r) to allow a comparison between differ- 
ent redshifts, although we note that for z < 5, the slope of £ ga i(r) 
is closer to 7 = 1.6. By using the power law fit we can compare 
the clustering amplitudes of different galaxy samples. To determine 
the clustering evolution of Lya emitters, we split the catalogues of 
Lya emitters into luminosity bins. For each of these sub-samples, 
we calculate the correlation function and then we obtain ro by fit- 
ting Eq. $2% as described. Fig. \5\ (top) shows the dependence of ro 
on luminosity for different redshifts in the range < z < 9. The 
errors are shown by the area enclosed by the thin solid lines for 
each set of points, and are calculated as the 90% confidence inter- 
val of the x 2 fit °f the correlation functions to Eq. {2]l (ignoring any 
covariance between pair separation bins). The range of luminosities 
plotted is set by the completeness limit of the simulation described 
in the previous section. We also discard galaxy samples with fewer 
than 500 galaxies, as in such cases, the errors are extremely large 
and the correlation functions are poorly defined. The clustering in 
high redshift surveys of Lya emitters is sensitive to the flux limit 
that they are able to reach, as shown by Fig. [5] 

The model predictions show modest evolution of ro with red- 
shift for most of the luminosity range studied. Over this redshift 
interval, on the other hand, the correlation length of the dark mat- 
ter changes dramatically, as shown by Fig. [4] Typically, at a given 
redshift, we find that ro shows little dependence on luminosity un- 
til a luminosity of Z/Lya~ 10 42 [erg s _1 h -2 ] is reached, bright- 
wards of which there is a strong increase in clustering strength 
with luminosity. This trend is even more pronounced at higher 
redshifts. Galaxies at z — are less clustered than galaxies in 
the range 3 < z < 7, except at luminosities close to LL ya ~ 



At z = 8.5, ro increases from ro 



12 ft -1 Mpc at 



10™[ergs -1 lT 

h~ 1 Mpc at L hya ~ 10 42 [ergs _1 h" 2 ] to r 
LL yQ >10 425 [ergs- 1 h- 2 ]. 

The growth of ro with limiting luminosity is related to the 
masses of the haloes which host Lya galaxies. As shown in the 
bottom panel of Fig. [5] there is not a simple relation between the 
median mass of the host halo and the luminosity of Lya emit- 
ters. For a given luminosity, Lya galaxies tend to be hosted by 
haloes of smaller masses as we go to higher redshifts. In addition, 
for redshifts z > 0, there is a trend of more luminous Lya emit- 
ters being found in more massive haloes. The key to explaining the 
trends in clustering strength is to compare how the effective mass of 
the haloes which host Lya emitting galaxies is evolving compared 
to the typical or ch aracteristic mass in the halo distribution (M») 
I IMo & White! 1 1 9961) : if Lya emitters tend to be found in haloes 
more massive than M*, then they will be more strongly clustered 
than the dark matter. This difference between the clustering am- 
plitude of galaxies and mass is explored more in the next section. 
In a hierarchical model for the growth of structures, haloes more 
massive than M* are more clustered, and thus we expect a strong 
connection between the evolution of ro and the masses of the ha- 
los. Fig.|5]shows that the dependence of ro (and host halo mass) on 
luminosity becomes stronger at higher redshifts. 

4.2 The bias factor of Lya emitters 

The galaxy bias, b, quantifies the strength of the clustering of galax- 
ies compared to the clustering of the dark matter. One way to 
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Figure 5. (Top): The evolution of the correlation length ro as a function of 
Lya luminosity for several redshifts in the range < z < 9, as indicated 
by the key. The thin solid coloured lines shows the errors on the correlation 
length. (Bottom): The evolution of the median mass of halos which host 
Lya emitting galaxies as a function of Lya luminosity, for the same range 
of redshifts as above. 



calculate the bias is by taking the ratio of £ ga i and £dm, £ ga i = 
& 2 £dm- Both correlation functions are estimated using Eq. (fTJ. 
Since the simulation contains ten billion dark matter particles, a 
direct pair-count calculation of £dm would demand a prohibitively 
large amount of computer time, so we extract dilute samples of the 
dark matter particles, selecting randomly ~ 10 7 particles. In this 
way we only enlarge the pair-count errors on £dm (which never- 
theless are still much smaller than for £ ga i) but obtain the correct 
amplitude of correlation function itself. 

To obtain the bias parameter of Lya emitters as a function of 
luminosity, we split the full catalogue of galaxies at each redshift 
into luminosity bins. For each of these bins we calculate £ ga i and 
divide by (dm to get the sqaure of the bias. Due to non-linearities, 
the ratio of £ ga j and (dm is not constant on all scales. As a rea- 
sonable estimation of the bias we chose the mean value over the 
range 6 /i _1 Mpc < r < 30 h~ 1 Mpc. Over these scales the bias 
does seem to be constant and in dependent of scale. This range is 
quite similar to the one used by iGao. Springel & White! J2005D to 
measure the bias parameter of dark matter haloes in the Millennium 
Simulation. 

The bias parameter can also be calculated approxi- 



mately using various analytical formalism s (Mo & White; Il99q : 



mately using various analytical formalisms (Mo & White 1990 
ISheth. Mo & Tormerj|200ll ; iMandelbaum et alj|2005t) . These pro 
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Figure 6. The galaxy bias as a function of Lya luminosity at different red- 
shifts, as indicated by the key. The solid lines show the results from the 
simulation and the dashed lines show the analytical expression of SMT. The 
area enclosed by the thin solid lines shows the error on the bias estimation 
for each redshift. 



cedures relate the halo bias to a(m,z), the rms linear mass 
fluctuation within a sphere which on average contains mass m. 
The bias factor for galaxies of a given luminosity is then ob- 
tained b y averaging the halo b ias over the halos hosting these 
galaxies. Le Delliou et al. (2006) used the analytical expression of 
ISheth. Mo & TormerJ feOOlh (hereafter SMT) to calculate the bias 
parameter for the semi-analytical galaxies. This gives a reasonable 
approximati on to the large-scale h alo bias measured in N-body sim- 
ulations (e.g lAngulo et alj fc008h ). 

Fig. [6] shows the bias parameter as a function of luminosity 
for redshifts in the range < z < 9, and compares the direct 
calculation using the N-body simulation (solid lines) with the an- 
alytical estimation (dashed lines). In order to calculate the uncer- 
tainty in our value of the bias, we assu me an error on £ ga i(V) of 
the fo rm A£ ga i = 2^/(1 + £ sal )/DD JHewittll 19821 : iBaugh et all 
1996), and assuming a negligible error in £dm we get 



Ab: 



b£,dn 



& 2 £ d 



DD 



(3) 



for the error in the bias estimation. This error is shown in Fig. [6] 
as the range defined by the thin solid lines surrounding the bias 
measurement shown by the points. 

The first noticeable feature of Fig.|6]is the strong evolution of 
bias with increasing redshift: From z = to z — 8.5 the bias factor 
increases from b(z — 0) ~ 0.8 to b(z = 8.5) ~ 12, which means 
that the clustering amplitude of Lya emitters at z = 8.5 is over 140 
times the clustering amplitude of the dark matter at this redshift. 
Another interesting prediction is the dependence of bias on Lya 
luminosity. For z > 3 there seems to be a strong increase of the 
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Figure 7. The HOD of Lya emitters at z = 3.3 (top) and z = 4.9(bottom). 
Each set of points represents a model sample with a different luminosity 
limit, as given by the key in the upper panel. The dashed line in each panel 
correspond to a "best" fit using the Berlind et al. (2003) parametrization. 



bias with luminosity for bins where LL yQ > 10 42 [erg s _1 h -2 ]. 
The agreement between the analytic calculation of the bias and the 
simulation result is reasonable over the range < z < 5, but be- 
comes less impressive as hi gher biases are reached. A sim ilar dis- 
crepancy was also noticed bv lGao. Springel & White! I I2005I) . where 
they compared the halo bias extra cted from the simula tion with dif- 
ferent analytic formulae (see also lAngulo et alj ([2008)). 

Another way to describe g alaxy clustering is th r ough the halo 
occup at ion distribution (HOD:lBenson et alj ( 120000 . iBerlind et alj 
J2003I) . ICoorav & Shetnl J2002h ). The HOD gives the mean num- 
ber of galaxies which meet a particular observational selection as 
a function of halo mass. For flux-limited samples, the HOD can be 
broken down into the contribution from central galaxies and satel- 
lite galaxies. In a simple picture, the mean number of central galax- 
ies is zero below some threshold halo mass, M m in, and unity for 
higher halo masses. With increasing halo mass, a second thresh- 
old is reached, Merit, above which a halo can also host a satellite 
galaxy. The number of satellites is usually described by a power- 
law of slope j3. In t he simplest case, three parameters are ne eded to 
describe the HOD JBerlind et alJl2002l : lHarnana et alj|2004l) : more 
detailed mode ls have been propos ed to describe the transition from 
to 1 galaxy JBerlind et aU2003l) . 

We can compute the HOD directly from our model. The re- 
sults are shown in Fig. [7] where we plot the HOD at two different 
redshifts for different luminosity limits. For comparison, we plot 
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the HOD parametrization o flBerlind et alj 120031) against our model 
predictions. In general, this HOD does a reasonable job of describ- 
ing the model output, and is certainly preferred over a simple three 
parameter model. However, for the z = 3.3 case (top panel of 
Fig-EJ, the shape of the model HOD for log (M/M ) > 13 is still 
more complicated than can be accommodated by the Berlind et al. 
parametrization, showing a flattening in the number of satellites as 
a function of increasing halo mass. There is less disagreement in 
the z = 4.9 case (bottom panel), but our model HOD becomes 
very noisy for large halo masses. 



5 MOCK CATALOGUES 

In this section we make mock catalogues of Lya emitters for a 
selection of surveys. In the previous section, we used the full sim- 
ulation box to make clustering predictions, exploiting the periodic 
boundary conditions of the computational volume. The simulation 
is so large that it can accommodate many volumes equivalent to 
those sampled by current Lya surveys, allowing us to examine 
the fluctuations in the number of emitters and their clustering. The 
characteristics of the surveys we replicate are listed in Table[T] 
The procedure to build the mock catalogues is the following: 

(i) We extract a catalogue of galaxies from an output of the Mil- 
lennium Simulation that matches (as closely as possible) the red- 
shift of a given survey. The simulation output contains 64 snapshots 
spaced roughly logarithmically in the redshift range [127, 0]. 

(ii) We choose one of the axes (say, the z-axis) as the line-of- 
sight, and we convert it to redshift space, to match what is observed 
in real surveys. To do this we replace r z (the comoving space coor- 
dinate) with 



r z + 



aH(z) 



[ft _1 Mpc] 



(4) 



where v z is the peculiar velocity along the z-axis, a — 1/(1 + z) 
and H(z) is the Hubble parameter at redshift z. 

(iii) We then apply the flux limit of the particular survey, to 
mimic the selection of galaxies. Table Q] shows the flux limits of 
the surveys considered. 

(iv) Then we extract many mock catalogues using the same ge- 
ometry as the real survey. We extract slices of a particular depth 
Az (different for each survey), and within each slice we extract as 
many mock catalogues as possible using the same angular geom- 
etry as the real sample. Az is determined using the transmission 
curves of the narrow-band filters used in each survey. To derive the 
angular sizes we use: 



Dt(9, 



d c (z)A9, 



dJz) = — - / :, 

HO J y/n m {l + Z ') 3 +fl A 



(5) 



(6) 



where Dt is the transverse comoving size in /i -1 Mpc, d c is the 
comoving radial distance, c and Ho are the speed of light and the 
Hubble constant respectively, Sl m and S1a are the density parame- 
ters of matter and the cosmological constant respectively. Eq. (|5} is 
valid for A9 <C 1 [radians], which is the case for the surveys we 
analyse in this work. We assume a flat cosmology. 

(v) From the line-of-sight axis we invert Eq. ® to obtain the 
redshift distribution of Lya galaxies within each mock catalogue, 
converting galaxy position to redshift. This information is then used 
to take into account the shape of the filter transmission curve for 
each survey, which controls the minimum flux and equivalent width 



as a function of redshift. The value given in TableQ]corresponds to 
the minimum flux and EW bs at the peak of the filter transmission 
curve. For redshifts at which the transmission is smaller (the tails 
of the curve) the minimum flux and EW Q bs required for a Lya 
emitter to be included are proportionally bigger. 

(vi) Finally, we allow for incompleteness in the detection of 
Lya emitters at a given flux due to noise in the observed images 
(where this information is available). To do this, we randomly se- 
lect a fraction of galaxies in a given Lya flux bin to match the 
completeness fraction reported for the survey at that flux. 

Real surveys of Lya emitters usually lack detailed informa- 
tion about the position of galaxies along the line-of-sight. Hence, 
instead of measuring the spatial correlation function defined in 
Eq. Q}, it is only possible to estimate the angular correlation func- 
tion, w(8), which is the projection on the sky of £(r). 

We estimate w(ff) from mock catalogues using the follow- 
ing procedure, which closely matches that used in real surveys. 
To compute the angula r correlation function we use the estimator 
dLandv & Szaiavlll993l) : 

(DD(9)} - 2(DR(6)) + (RR(9)} 

WLS (e) ~ (kim) ' ( } 

where (DR) stands for data-random pairs, (RR) indicates the 
number of random-random pairs and all of the pair counts have 
been appropriately normalized. In the case of a finite volume sur- 
vey, this estimator is more robust than the one defined in Eq. Q} 
because it is less sensitive to errors in the mean density of galaxies, 
such as could arise from boundary effects. In practice, the measured 
angular correlation function can be approximated by a power law: 



(8) 



w(6)=A w l- 



where A w is the dimensionless amplitude of the correlation func- 
tion, and S is related to slope of the spatial correlation function, 7, 
from Eq. (|2j by <5 = 7 — l.A relation between ro a nd A w can b e 
obtained using a generalization of Limber's equation (Simon 2007). 
Surveys of Lya emitters typically cover relatively small ar- 
eas of sky and can display significant clustering even on the scale 
of the survey. As a result, the mean galaxy number density within 
the survey area will typically differ from the cosmic mean value. 
If the number of galaxies within the survey is used to estimate the 
mean density, used in Eq. Q, rather than the unknown true un- 
derlying density, this leads to a bias in the estimated correlation 
function. This effect is known as the integral constraint (IC) bias. 
lLandv & Szalavl (1993) show that when their estimator is used, the 
expected value of the estimated correlation function wls(9) is re- 
lated to the true correlation function w(8) by 



{wls{0)) = 



w(9) — wn 



1 +WQ 

where the integral constraint term tun is defined as 



"'<; = 7^5- / dQ 1 dfl2w(8i2) 



(9) 



(10) 



integrating over the survey area, and is equal to the fractional vari- 
ance in number density over that area. 

When the clustering is weak Eq. simplifies to {wls (8)) — 
w{6) — wn. This motivates the additive IC correction which is cus- 
tomarily used in practice: 

W CO rr{8) - WLS (6) + UlQ. (11) 

We use this to correct the angular correlation functions from our 
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Table 1. Summary of survey properties and simulation results. 



(1) 


(2) 


(3) 


(4) 




(5) 


(6) 




(7) 


(8) 


(9) 


(10) 


(ID 


Survey 


^survey 


-^simulation 


Az 


Area 


[arcmin] 


EW ohB 


A] 


-FLycJergs- 1 cm" 2 ] 


AUs 


at median 
mock 


10-90% 


C"u 


MUSYC 


3.1 


3.06 


0.04 




961 


80 




1.5 X 10~ 17 


162 


142 


89-207 


0.41 


SXDS 


3.1 


3.06 


0.06 




3538 


328 




1.1 X 10" 17 


356 


316 


256-379 


0.19 




3.7 


3.58 


0.06 




3474 


282 




2.7 x 10~ 17 


101 


80 


60-110 


0.31 




5.7 


5.72 


0.10 




3722 


335 




7.4 x 10~ 18 


401 


329 


255-407 


0.23 


ELVIS 


8.8 


8.54 


0.10 




-3160 


100 




3.7 x 10~ 18 


- 


20 


14-29 


0.37 



Column (1) gives the name of the survey; (2) and (3) show the redshift of the observations and nearest output from the simulations, respectively; (4) shows 
the redshift width of the survey, based on the FWHM filter width; (5) shows the area covered by each survey; (6) and (7) show the equivalent width and Lya 
flux limits of the samples, respectively; (8) shows the number of galaxies detected in each survey; (9) and (10) show the median of the number of galaxies and 
the 10-90 percentile range found in the mock catalogues for each survey. Finally, column (11) gives the fractional variation of the number of galaxies, defined 
in Eq. {n\ . 



mock catalogues. In order to estimate the term wti, we approximate 
the true correlation function as a power law, as in Eq. (§}, and use 



wn 






(12) 



(IDaddi et alj200(J) . where (RR) are the same random pairs as used 
in the estimate of wls{9). 

To quantify the sample variance expected for a particular sur- 
vey, we use the mock catalogues to calculate a coefficient of vari- 
ance (Cv), which is a measure of the fractional variation in the 
number of galaxies found in the mocks 



i-^v — 



N 90 - Ny 

2-/V mc d 



(13) 



where iVio and iVgo are the 10 and 90 percentiles of the distribution 
of the number of galaxies in the mocks, respectively, and 7V me d is 
the median. The value of C v allows us to compare the sampling 
variance between different surveys in a quantitative way. 

To analyse the clustering in the mock catalogues, we measured 
the angular correlation function of each mock catalogue using the 
procedure explained above. Then we fit Eq. ([8} to each of the mock 
w(6) and we choose the median value of A w as the representative 
power law fit. We fix the slope of w(8) to S = 0.8 for all surveys, 
except for ELVIS, where we found that a steeper slope, S — 1.2, 
agreed much better with the simulated data. To express the variation 
in the correlation function amplitude found in the mocks, we calcu- 
late the 10 and 90 percentiles of the distribution of A w for each set 
of mock surveys. We also calculate w{0) using the full transverse 
extent of the simulation, with the same selection of galaxies as for 
the real survey. This estimate of w(8), which we call the Model 
w(8), represents an ideal measurement of the correlation function 
without boundary effects (so there is no need for the integral con- 
straint correction). 

The surveys we mim i c are the following: the MUSYC Sur- 
vey JGronwall et al.l 120071 ; iGawiser et alj|20071) . which is a large 
sample of Lya emittin g galaxies at z = 3.1; the SXDS Survey 
dOuchi et alj|2005l 120071) . which covers three redshifts: z = 3.1, 
z — 3.7 and z — 5.7, and finally, we make pr edictions for the 
forthcoming ELVIS survey JNilsson et all 12007 al ibi), which is de- 
signed to find Lya emitting galaxies at z = 8.8. We now describe 
the properties of the mock catalogues for each of these surveys in 
turn. 




500 1000 1500 

Observed Equivalent Width (A) 

Figure 8. The observed EW t s distribution of the MUSYC survey at z 
3.1 (solid black line) and the simulation (solid green line). 



5.1 The MUSYC Survey 

The Multi-wavele n gth Survey by Yale-Chile (MUSYC) 
JOuadri et all 120071 : fcawiser et all 120061 |200l iGronwall et ail 
120071) is composed of four fields covering a total solid angle of 
one square degree, each one imaged from the ground in the optical 
and near-infrared. Here we use data from a single MUSYC field 
consisting of narrow-band observations of Lya emitters made 
with the CTIO 4 -m telescope in the E xtended Chandra Deep Field 
South (ECDFS) JGronwall et al .120071) . The MUSYC field, centred 
on redshift z = 3.1, contains 162 Lya emitters in a redshift range 
of Az ~ 0.04 over a rectangular area of 31' x 31' with flux and 
EWobs limits described in TableQ] 

To test how well the model reproduces the Lya emitters seen 
in the MUSYC survey, we first compare the predicted (green) 
and measured (black) distributions of Lya equivalent widths in 
Fig. [8] Here the predicted distribution comes from the full simula- 
tion volume. Overall, the simulation shows remarkably good agree- 
ment with the real data, with a slight underestimation in the range 
200 < EWobs[M < 400.For£W obs [A] > 400 both distributions 
seem to agree well, although the number of detected Lya emitters 
in the tail of the distribution is small. 

For the MUSYC survey we built 252 mock catalogues from 
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Figure 9. An image of a mock catalogue of the MUSYC Survey of Lya 
emitters at z = 3.1. The colour format and legend are the same as used in 
Fig.[T] The angular size of the image is 31' X 31'. 
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Figure 10. Histogram of the number of Lya emitters found in the mock 
MUSYC catalogues. The red line shows the median, the dashed blue lines 
show the 10-90 percentile range, and the green line shows the number of 
galaxies detected in the real survey. 



the Millennium simulation volume using the procedure outlined 
above. Fig. [9] shows an example of one of these mock catalogues. 
Many of the Lya emitters are found in high dark matter density 
regions, and thus they are biased tracers of the dark matter. Fig. 1101 
shows the distribution of the number of galaxies in the ensemble of 
mocks. The green line shows the number detected in the real sur- 
vey (162), which falls within the 10-90 percentile range of the mock 
distribution and is close to the median (142). The 10-90 percentile 
range spans an interval of 89 < JV ga i < 207, indicating a large 
cosmic variance for this survey configuration, with C v = 0.41. 

The next step is to compare the clustering in the simulations 
with the real data. Fig. QT] plots the correlation functions from 
the mock catalogues alongside that measured in the real survey 



co 



10.000 


J ' ' ' ' 




rN. MUSYC w(e) 


1.000 


r - . - v ^--^, Mocks w(0) 


0.100 


: A w = 0.53K^f ' 


fo 






B\ w = 0.29 ±o.; 


X" 


V 






III! 


T 


0.010 








O - 

i 






L 







0.001 


1 , , , 


. 1 





10 



9 [arcmin] 



Figure 11. Angular clustering for the MUSYC S urvey. Green circles show 
w(0) calculated from the observed catalogue iGawiser et al. 2007]). The 
blue circles show the median w(8) from all mock catalogues, corrected for 
the integral constraint effect. The dark and light grey shaded regions respec- 
tively show the 68% and 95% ranges of the distribution of w(8) measured 
in the mock catalogues. The red open circles show the Model correlation 
function, obtained using the width of the entire simulation box (and the 
same EW, flux and redshift limits). The dashed lines show the power-law 
fit to the observed w(8) (green) and the median fit to w(9) from the mock 
catalogues (blue). The amplitudes A w of these fits are also given in the 
figure. 



( IGawiser et alj|2007l) . There is reasonable agreement between the 
mock catalogue results and the observed dataQ The median w(9) 
from the mocks is slightly higher than the observed values, but the 
observed w(8) is within the range containing 95% of the mock 
w(6) values (i.e. between the 2.5% and 97.5% percentiles, shown 
by the light grey shaded region). We quantified this difference by 
fitting the power law of Eq. l[8j to both real and mock data. The 
power-law fits were made over the angular range 1-10 arcmins. 
We find the value of A w (Eq. [8} for each of the mock catalogue 
correlation functions by ^-fitting (using the same expression as 
in £|4,2| for the error on each model datapoint) and then we plot 
the power law corresponding to the median value of A w . We find 
A w = 0.53l 33 for the mocks, where the central value is the me- 
dian, and the range between the error bars contains 95% of the val- 
ues from the mocks. For the real data, we find the best-fit A w and 
the 95% confidence interval around it by \ 2 -fi tting, using the error 
bars on the individual datapoints reported by IGawiser et all This 
gives A w = 0.29 ± 0.17 for the real data. We again see that the 
observed value is within the 95% range of the mocks, and is thus 
statistically consistent with the model prediction. We also see that 
the 95% confidence error bar on the observed A w is much smaller 
than the error bar we find from our mocks. This latter discrep- 
ancy arises from the small errors quoted on w(8) bv lGawiser et al.l 
(2007), which are based on modified Poisson pair count errors, but 
neglect variations between different sample volumes (i.e. cosmic 
variance). On the contrary, using our mocks, we are able to take 
cosmic variance fully into account. This underlines the importance 

1 After this paper was accepted for publication, we learned that the 
MUSYC data shown in FigfTT]are not corrected for the IC. Including this 
correction would improve the agreement with the model. 
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Figure 12. Mock catalogues of the SXDS survey for redshifts 3.1 (top), 3.6 
(centre) and 5.7 (bottom). The colour scheme and legend are the same as 
used previously. The angular size of the image is 1.4° X 1.4°. 



of including the cosmic variance in the error bars on observational 
data, to avoid rejecting models by mistake. 

The red open circles in Fig. [TT] show the correlation function 
obtained using the full angular size attainable with the Millennium 
simulation but keeping the same flux, EW and redshift limits as 
in the MUSYC survey (averaging 7 different slices), and so this 
measurement has a smaller sample variance. The area used here 
is ~ 120 times bigger than the MUSYC area, so IC effects are 
negligible on the scales studied here. We refer to this as the Model 
prediction for w(9). 

The median of the mock correlation functions (including the 
IC correction, blue circles) is seen to agree reasonably well with the 
Model correlation function (red open circles) for 9 < 20[arcmin]. 
This shows that for this survey it is possible to obtain an observa- 
tional estimate of the correlation function which is unbiased over 
a range of scales, by applying the integral constraint correction. 
However, on large scales the median w(9) of the mocks (with IC 
correction included) lies above the Model w(8), which shows that 
the IC correction is not perfect, even on average. Presumably this 
failure is due (at least in part) to the fact that the IC correction 
procedure assumes that w(9)is a power law, while the true w(6) 
departs from a power law on large scales. It is also important to 
note that these statements only apply to the median w(8) derived 
from the mock samples - the individual mocks show a large scatter 
around the true w(9) (as shown by the grey shading), and the IC 
correction does not remove this. This scatter rapidly increases at 
both small and large angular scales, so the best constraints on w(6) 
from this survey are for intermediate scales, 1 < 9 < 5[arcmin]. 

5.2 The SXDS Surveys 

The Subara/XMM-Newton Deep Survey (SXDS) i JOuchi et al.l 
I2005L 120071: iKashikawa et alj|2006l) is a multi-wavelength survey 
covering ~ 1.3 square degrees of the sky. The survey is a combi- 
nation of deep, wide area imaging in the X-ray with XMM-Newton 
and in the optical with the Subaru Suprime-Cam. Here we are in- 
terested in the na rrow-band observa tions at three different redshifts: 
3.1, 3.6 and 5.7 dOuchi et alj2007t) . 

We build mock SXDS catalogues following the same proce- 
dure as outlined above. Fig. [T2] shows examples of our mock cata- 
logues for each redshift. As in the previous case, we see that Lya 
emitters on average trace the higher density regions of the dark mat- 
ter distribution. The real surveys have a well defined angular size. 
However, the area sampled is slightly different at each redshift. In 
order to keep the cross-like shape in our mock catalogues and be 
consistent with the exact area surveyed, we scaled the cross-like 
shape to cover the same angular area as the real survey at each red- 
shift. 

Fig. [T3] shows the distribution of the number of galaxies in 
the mock catalogues for the three redshifts surveyed. The median 
number of galaxies in the mocks at z — 3.1 is 316, which is 
remarkably similar to the observed number, 356. The 10-90 per- 
centile range of the mocks covers 256-379 galaxies. The coeffi- 
cient of variation is C v = 0.19, less than half the value found 
for the MUSYC mock catalogues, C v = 0.41. This reduction is 
due mainly to the larger area sampled by the SXDS survey. In 
the second slice (z = 3.6), the redshift is only slightly higher 
than in the previous case, but the number of galaxies is much 
lower. Looking at the top panel of Fig. [2] we see that the observed 
LFs are basically the same for these two redshifts. The differ- 
ence between the two samples is explained mostly by the differ- 
ent Lya flux limits (1.2 x 10 _17 [erg s _1 cm -2 ] for z = 3.1 and 
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Figure 13. The distribution of the number of galaxies in mock SXDS catalogues, for z = 3.1 (left), z = 3.6 (centre) and z = 5.7 (right). The red line shows 
the median of the number of galaxies inside the mock catalogues, the blue lines show the 10-90 percentiles of the distribution, and the green line shows the 
number observed in the SXDS. 
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Figure 14. Angular correlation functions for the mock SXDS catalogues at z = 3.1 (Left), z = 3.6 (Center) and z = 5.7 (Right). The blue circles show the 
median w(6) from the mock catalogues (after applying the IC correction). The dark and light grey shaded regions respectively show the 68% and 95% ranges 
of the distribution of w(8) measured in the mock catalogues. The red open circles are the Model w(8) calculated using the full simulation width, averaged 
over many slices. The green circles show the observational data from Ouchi et al. The dashed lines show the power-law fit to the observed w(9) (green) and 
the median fit to w(9) from the mock catalogues (blue). The amplitudes A w of these fits are also given in the figure. 



2.6 x lCT 17 [erg s" 1 cm" 2 ] for z = 3.6). For the z - 3.6 mocks, 
we find a median number of 80 and 10-90% range 60-110, in 
reasonable agreement with the observed number of galaxies, 101. 
The fractional variation in the number of galaxies, quantified by 
C v — 0.31, is larger than in the previous case, due to the smaller 
number of galaxies. The z — 5.7 case is similar to the lower red- 
shifts. The median number in the mocks is 329, with 10-90% range 
255—407, again consistent with the observed number, 401. The co- 
efficient of variation for this survey is C v — 0.23, so the sampling 
variance is intermediate between that for the z — 3.1 and z = 3.6 
surveys. 

The angular correlation functions of the mock catalogues are 
compared with the real data in Fig. Q4] The observational data 
shown are preliminary angular correlation function measurements 
in the three SXDS fields, with errorbars based on bootstrap resam- 
pling (M. Ouchi, private communication). As in our comparison 



with the MUSYC survey, we plot the median correlation function 
measured from the mocks, after applying the IC correction, as a 
representative w(9) . As before, we also perform a \ 2 fit of a power 
law to the w(9) measured in each mock, and to the observed val- 
ues, to determine the amplitude A w . The fit is performed over the 
range 1 < 9 < 10 [arcmin] as before. 

The left panel of Fig. [14] shows the correlation functions at 
z — 3.1. According to both the error bars on the observational 
data, and the scatter in w(9) in the mocks (shown by the grey shad- 
ing), this survey provides useful constraints on the clustering for 
1 ;$ S; 10[arcmin], but not for smaller or larger angles, where 
the scatter becomes very large. The fitted amplitude A w for the 
observed correlation function is A w = (0.32 ± 0.22) (95% confi- 
dence, using the error bars reported by Ouchi et al.), somewhat be- 
low the median value found in the mocks, A w = 0.60, but within 
the 95% range for the mocks (A w = 0.23 — 1.35). Based on the 
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Figure 15. An example of a mock catalogue for the ELVIS Survey. The 
image shows the observed field of view (four strips). The legend and colour 
format are the same as in Figsl9land ll2l 



mocks, the model correlation function is consistent with the SXDS 
data at this redshift. 

Comparing these results with those we found for the MUSYC 
survey (which has a very similar redshift and flux limit to SXDS at 
z = 3.1), we see that the results seem very consistent. The MUSYC 
survey has a larger sample variance than SXDS, but the measured 
clustering amplitude is very similar in the two surveys. 

The middle panel of Fig.[T4lshows the correlation function for 
the z = 3.6 survey. In this case, the error bars on the observational 
data and the scatter in the mocks are both larger, due to the lower 
surface density of galaxies in this sample. For the observed correla- 
tion amplitude, we obtain A w — 0.75 ± 0.72, while for the mocks 
we find a median A w = 0.99, with 95% range 0.06-2.01, entirely 
consistent with the observational data. 

The right panel of Fig.[l4]shows the correlation function pre- 
dictions for z = 5.7. According to the spread of mock catalogue 
results, the w(6) measured here is the most accurate of the three 
surveys, due to the large number of galaxies. For the mocks, we 
find a median correlation amplitude A w — 0.82, with 95% range 
0.42-1.49. For the observations, we find A w = 1.56 ± 0.27, if 
we assume a slope 8 — 0.8. The average correlation function in 
the mocks agrees well with this slope over the range fitted, but the 
observational data for w(9) at this redshift prefer a flatter slope. 
The model is however still marginally consistent with the observa- 
tional data at 95% confidence. Similarly flat shape s were also found 
in som e previous surveys dShimazaku et al.ll2004l ; lHavashino et al.l 
120041) in the same field, but at redshifts 3.1 and 4.9 respectively. 
However, these surveys are much smaller in terms of area surveyed 
and number of galaxies (this is particularly so in Shi mazaku et al.l 
(2004)). This behaviour in w(6) might be produced by the high 
density regions associated with protoclusters in the SXDS fields 
(M. Ouchi, private communication), but still this behaviour of w(9) 
must be confirmed to prove that it is a real feature of the correlation 
function. 
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Figure 16. Histogram of the number of galaxies in mock catalogues ex- 
pected for the ELVIS Survey. The red line shows the median of the distri- 
bution, and the blue dashed lines the 10-90 percentiles of the distribution. 



5.3 ELVIS Survey 

One of the main goals of the public surveys planned for the Visible 
and Infrared Survey Telescope for Astronomy (VISTA) is to find 
a significant sample of very high redshift (z ~ 8.8) Lya emitters. 
This program is called the Emissio n-Line galaxies with VISTA sur- 
vey (ELVIS) JNilsson et alj2007al lbl). The plan for ELVIS is to im- 
age four strips of 11.6' x 68.27', covering a total area of 0.878deg 2 , 
as shown in Fig. [15] This configuration is dictated by the layout of 
the VISTA IR camera array. The only current d etections of Lya 
emitters at z > 8 are those of IStark et al.l |2007), which have not 
yet been independently confirmed. Lya emitting galaxies at such 
redshifts will provide us with valuable insights into the reionization 
epoch of the Universe, as well as galaxy formation and evolution. 
For our mock ELVIS catalogues, we select galaxies with 



a minimum flux of i*Lyo 



3.7 x lO^ergs" 1 cm" 2 ] and 



EW ohs > 100 A, as listed in Table Q] (The EW ohs limit is just 
a rough estimate, although our predictions should not be sensitive 
to the exact value chosen.) Fig. [T5] shows the expected spatial dis- 
tribution of z = 8.5 galaxies in one of the ELVIS mock catalogues. 
Each mock catalogue has four strips, matching the configuration 
planned for the real survey. The median number of galaxies within 
the mock catalogues is 20, with a 10-90 percentile spread of 14 to 
29 galaxies, as shown in Fig. [16] The fractional variation in number 
between different mocks is C v = 0.37, which is quite large, but no 
worse than for the MUSYC survey at 2 = 3.1, even though that 
survey has 10 times as many galaxies. 

The angular correlation functions of the mock ELVIS cata- 
logues were calculated in the same way as before (including the in- 
tegral constraint correction). Fig.[T7]shows the median of the w{9) 
values measured from each mock catalogue (blue circles), and also 
the mean (orange circles). In this case, the distribution of w(9) val- 
ues within each angle bin is very skewed, due to the small number 
of galaxies in the mock catalogues, and so the mean and median can 
differ significantly. The dark and light grey shaded regions show 
the ranges containing 68% and 95% of the w{9) values from the 
mocks, from which it can be seen that the cosmic variance for this 
survey is very large. We also show the Model w{9) (red circles), 
which provides our best estimate of the true correlation function 
based on the Millennium simulation, and was calculated by aver- 
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Figure 17. Angular Correlation Functions in the mock catalogues of the 
ELVIS Survey. The blue circles shows the median w(8) from the mock cat- 
alogues, while the orange circles shows the mean. The dark and light grey 
shaded regions respectively show the 68% and 95% ranges of the distri- 
bution of w(8) in the mocks. The red open circles show the Model w(8) 
obtained using the full width of the simulation box. The amplitude and slope 
of the median power-law fit to the mocks are also given. 



aging over 10 slices of the simulation, using the full width of the 
simulation box. Even here, the error bars on w(6) are quite large, 
due to the very low number density of galaxies predicted. We see 
that the mean and median w(6) in the mocks lie close to the Model 
values for 2[arcmin] < 6 < 20[arcmin], so in this sense they 
provide an unbiased estimate. 

The most noticeable feature of Fig.QTJis the large area covered 
by both the 68% and 95% ranges of the distribution of w(9) in 
the mocks, which extend down to w(6) = 0. This indicates that 
the ELVIS survey will only be able to put a weak upper limit on 
the clustering amplitude of z ~ 8.8 Lya emitters, if our model is 
correct. As before, we can quantify this by fitting a power law to 
w(8) in our mocks. We notice that the Model w(9) for this sample 
has a significantly steeper slope, 5 — 1.2, than the canonical value 
5 — 0.8, and so we do our fits to the mocks using S = 1.2. We find 
a median amplitude in the mocks A w = 3.571 335, where the error 
bars give the 95% range. 



6 SUMMARY AND CONCLUSIONS 

We have combined a semi-analytical model of galaxy formation 
with a high resolution, large volume N-body simulation to make 
predictions for the spatial distribution of Lya emitters in a ACDM 
universe. 

Our model for Lya emitters is appealingly simple. Using the 
star formation history predicted for each galaxy from the semi- 
analytical model to compute the production of Lyman continuum 
photons, we find that on adopting a constant escape fraction of Lya 
photons the observed number of Lya emitters can be reproduced 
amazingly well over a range of redshifts (Le Delliou et al. 2006). 
Our modelling of Lya emission may appear overly simplistic on 
first comparison to oth er calculations in the literature. For example, 
iNagamine et al.l (2008) predicted the clustering of Lya emitters in 
a gas-dynamic simulation, modelling the Lya emission through a 



Lya escape fraction or a duty cycle scenario. However, the frac- 
tion of active emitters in the duty cycle scenario needs to be tuned 
at each redshift, for which there is no physical justification. Since 
our predictions for Lya emission are derived from a full model of 
galaxy formation, it is straightforward to extract other properties of 
the emitters, such as their stellar mass or the mass of their host dark 
matter halo (Le Delliou et al. 2006). In this paper we have extended 
this work to include explicit predictions for the spatial and angular 
clustering of Lya emitters. 

We have studied how the clustering strength of Lya emitters 
depends upon their luminosity as a function of redshift. Generally, 
we find that Lya emitters show a weak dependence of clustering 
strength on luminosity, until the brightest luminosities we consider 
are reached. At the present day, Lya emitters display weaker clus- 
tering than the dark matter. This changes dramatically at higher red- 
shifts (z > 3), with currently observable Lya emitters predicted to 
be much more strongly clustered than the dark matter, with the size 
of the bias increasing with redshift. We compared the simulation 
results with analytical estimates of the bias. Whilst the analytical 
results show the same trends as the simulation results, they do not 
match well in detail, and this supports the use of an N-body simu- 
lation to study the clustering. 

A key advantage of using semi-analytical modelling is that 
the evolution of the galaxy population can be readily traced to 
the present day. This gives us some confidence in the star forma- 
tion histories predicted by the model. The semi-analytical model 
passes tests on the predicted distribution of star formation rates 
at high redshift (the number counts and redshifts of galaxies de- 
tected by their sub-millimetre emission and the luminosity function 
of Lyman-break galaxies), whilst also giving a reasonable match 
to the present day galaxy luminosity function (Baugh et al. 2005), 
and also ma tching the observed evolution of galaxies in the infrared 
JLacev et alj|2008h . Gas dynamic simulations as a whole struggle 
to reproduce the present-day galaxy population, due to a combi- 
nation of a limited simulation volume (set by the need to attain a 
particular mass resolution) and a tendency to overproduce massive 
galaxies. The small box size typically employed in gas dynamic 
simulations means that fluctuations on the scale of the box become 
nonlinear at low redshifts, and their evolution can no longer be ac- 
curately modelled. A further consequence of the small box size is 
that predictions for clusterin g are limited to small pair separations 
(e.g. INagamine et alj J2008I) use a box of side 33 /i _1 Mpc, limit- 
ing their clustering predictions to scales r < 3h~ 1 Mpc). By using 
a simulation with a much larger volume than that of any existing 
Lya survey, we can subdivide the simulation box to make many 
mock catalogues to assess the impact of sampling fluctuations (in- 
cluding cosmic variance) on current and future measurements of 
the clustering of Lya emitters. 

We made mock catalogues of Lya emitters to compare with 
the MUSYC (z = 3) and SXDS (z = 3 - 6) surveys, and to make 
predictions for the forthcoming ELVIS survey at z ~ 9. In the 
case of MUSYC and SXDS, we found that the observed number of 
galaxies lies within the 10-90 percentile interval of the number of 
Lya emitters found in the mocks. We find that high-redshift clus- 
tering surveys underestimate their uncertainties significantly if they 
fail to account for cosmic variance in their error budget. Overall, 
the measured angular correlation functions are consistent with the 
model predictions. The clustering results in our mock catalogues 
span a wide range of amplitudes due to the small volumes sampled 
by the surveys, which results in a large cosmic variance. ELVIS 
will survey Lya emitters at very high redshift (z = 8.8). Our pre- 
dictions show that a single pointing will be strongly affected by 
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sample variance, due to the small volume surveyed and the strong 
intrinsic clustering of the Lya emitters which will be detected at 
this redshift. Many ELVIS pointings will be required to get a ro- 
bust clustering measurement. 

We have shown that surveys of Lya emitters can open up a 
new window on the high redshift universe, tracing sites of active 
star formation. With increasing redshift, the environments where 
Lya emitters are found in current and planned surveys become in- 
creasingly unusual, sampling the galaxy formation process in re- 
gions that are likely to be proto-clusters and the progenitors of 
the largest dark matter structures today. Our calculations show that 
with such strong clustering, surveys of Lya emitters covering much 
larger cosmological volumes are needed in order to minimize cos- 
mic variance effects. 
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ADDENDUM TO PUBLISHED VERSION 

As explained in the footnote to section 5.1, after the paper was ac- 
cepted for public ation we learned that the MUSYC datapoints for 
w(8) presented in lGawiser et alj (12007) are in fact not corrected for 
the integral constraint (IC) effect. We therefore show in Fig.ll8lthe 
same comparison of angular correlation functions as was made in 
Fig. QTJ but now without making the IC correction in our mock 
catalgoues. The effect of not including the IC correction in the 
mocks is that there is now much better agreement between w{6) 
in the mocks and the MUSYC observational data. 

Comparing FigQTJwith Fig[T8]we can also directly see the ef- 
fect of the IC correction on w{6). In Fig[T8] the median of the mock 
catalogue w(9) (blue circles) and the Model w(9)(red open circles) 
agree on small scales, but differ on scales larger than 2'. Hence, un- 
less the IC correction is included in the mock catalogues (as is done 
in Fig. lilt we get a biased estimate of the true clustering. In this 
case, as shown in Fig. QjJ the IC correction increases the range of 
agreement with the Model w{9) to scales up to ~ 20'. 
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Figure 18. Angular clustering for the MUSYC S urvey. Green circles show 
w(9) calculated from the observed catalogue iGawiser et al. 2007J). The 
blue circles show the median w(0) from all mock catalogues, with no cor- 
rection for the integral constraint effect. The dark and light grey shaded re- 
gions respectively show the 68% and 95% ranges of the distribution of w(8) 
measured in the mock catalogues. The red open circles show the Model cor- 
relation function, obtained using the width of the entire simulation box (and 
the same EW, flux and redshift limits). The dashed lines show the power- 
law fit to the observed w(9) (green) and the median fit to w(9) from the 
mock catalogues (blue). The amplitudes A w of these fits are also given in 
the figure. 



